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Abstract 

C-reactive protein (CRP) is a general marker of systemic inflammation and cardiovascular disease (CVD). The genetic 
contribution to differences in CRP levels remains to be explained, especially in non-European populations. Thus, the aim of 
this study was to identify genetic loci associated with CRP levels in Korean population. We performed genome-wide 
association studies (GWAS) using SNPs from 8,529 Korean individuals (7,626 for stage 1 and 903 for stage 2). We also 
performed pathway analysis. We identified a new genetic locus associated with CRP levels upstream of ARG1 gene (top 
significant SNP: rs9375813, P meta = 2.85 x 1 0 -8 ), which encodes a key enzyme of the urea cycle counteract the effects of nitric 
oxide, in addition to known CRP (rs7553007, P meta = 1.72x10" 16 ) and HNF1A loci (rs2259816, P meta = 2.90x10" 10 ). When we 
evaluated the associations between the CRP-related SNPs with cardiovascular disease phenotypes, rs9375813 {ARG1) 
showed a marginal association with hypertension (P = 0.0440). To identify more variants and pathways, we performed 
pathway analysis and identified six candidate pathways comprised of genes related to inflammatory processes and CVDs 
{CRP, HNF1A, PCSK6, CD36, and ABCA1). In addition to the previously reported loci {CRP, HNF1A, and IL6) in diverse ethnic 
groups, we identified novel variants in the ARG1 locus associated with CRP levels in Korean population and a number of 
interesting genes related to inflammatory processes and CVD through pathway analysis. 
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Introduction 

C-reactive protein (CRP) is an acute phase reactant protein and 
a general marker of systemic inflammation that is produced by the 
liver. High CRP levels are known to be associated with 
cardiovascular disease (CVD) risk factors, including hypertension, 
coronary heart disease (CHD), and stroke, in addition to 
traditional risk factors such as BMI, smoking, diabetes, and 
cholesterol levels [1]. 

The heritability of CRP levels is estimated to be 25% to 40%, 
indicating that genetic variations can affect inter-individual or 
inter-ethnic group differences in CRP levels [2]. Indeed, CRP 
levels vary significantly among different ethnic groups [3]. For 
example, serum CRP levels were reported to be relatively lower in 
East Asians compared to Europeans, South Asians, and Aboriginal 
peoples in Canada [3]. Several large-scale genome-wide associa- 
tion studies (GWAS) to identify genetic links to difference in CRP 
levels have been undertaken; however, most of these studies were 
performed in European populations [4,5] . In addition to the well- 
known variants that correlate with CRP levels in Europeans such 



as CRP, HNF1A (hepatic nuclear factor 1 -alpha), and APOE 
(apolipoprotein E), some recent GWASs have identified new 
variants such as IL6 (interleukin-6) in the Japanese population and 
TREA12 (triggering receptors expressed by myeloid cells 2) in 
African American women [6,7]. Differences in allele frequencies, 
linkage disequilibrium (LD), effect size, and biological adaptations 
may influence the identification of variants in different ethnic 
groups [8]. 

In spite of the identification of these CRP-associated singie- 
nucleotide polymorphisms (SNPs) and genetic loci by large-scale 
GWASs, the genetic contributions to differences in CRP levels still 
need further investigation. Biological pathway-based analyses may 
be able to obtain more meaningful information from high- 
throughput whole genome data [8]. Pathway analysis can even 
suggest candidate variants that might be missed in a classical 
GWAS approach [9]. In the present study, we used a combined 
approach of GWAS and pathway analysis and attempted to 
identify SNPs associated with CRP levels in the Korean 
population. 
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Materials and Methods 

Study subjects for stage 1 and stage 2 GWAS 

As for the stage 1 subjects, a total of 7,626 individuals were used 
who have been participated in the Korea Association Resource 
project (KARE, stage I) [10]. Half of the subjects were recruited 
from one urban community (Ansan) and the other half came from 
one rural community (Ansung), Gyeonggi province, Korea. As for 
the stage 2 subjects, 903 independent samples were obtained from 
Yonsei University in Korea, which were genotyped with the same 
Affymetrix Genome-Wide Human SNP Array 5.0 platform. The 
general characteristics of stage 1 and stage 2 subjects including the 
CRP levels are summarized in Table SI in File SI. 

Genotyping and quality control of the study population 
for stage 1 GWAS 

The discovery subjects were genotyped using the Affymetrix 
Genome-Wide Human SNP Array 5.0. Genotypes were called 
using the BRLMM algorithm (http://media.affymetrix.com/ 
support/ technical/ whitepapers/brlmm_whitepaper.pdf). Quality 
control was performed according to the previous studies [10,11]. 
We adjusted the individual data with discordant sex information 
and filtered out those with the higher than 3% genotype failure 
rate. We also excluded individuals with the heterozygosity rate 
more than 3 standard deviations away from the mean. We applied 
SNP imputation to increase the coverage of variants by capturing 
additional association signals. Imputed SNP data were obtained 
from Cho et al. [10] who generated them using IMPUTE software 
[12] based on JPT/CHB data of HapMap as a reference panel. 
Based on these imputed SNP genotypes, we used standard quality 
control parameters such as SNP call rate >95%, minor allele 
frequency (MAF) >5% and Hardy- Weinberg equilibrium P> 
0.001. We excluded individuals with a CRP level greater than 
10 mg/dL for our linear regression analysis. We also excluded 
participants with missing CRP levels and with diabetes according 
to the previous study [4] . Population stratification analysis of the 
phase 1 (KARE) data was already performed by Cho et al. [10] 
using principal component analysis and multidimensional scaling, 
in which no population stratification was observed. Through this 
quality control process, genotypes of 7,626 individuals for 
1,219,546 autosomal SNPs were used for association analysis of 
the stage 1 cohorts. 

Stage 2 GWAS with the independent samples 

To confirm the association of the identified SNPs from the stage 
1 data, we analyzed an additional data set of 979 Korean 
individuals from Seoul area of South Korea, part of the Korean 
Metabolic Syndrome Research Initiative study [13]. We applied 
the same filtration process described above and selected 903 
individuals whose CRP level and covariate information are 
available for subsequent analyses. 

Ethics statement 

The local Ethics Committee approved this study, and written 
informed consent was obtained from all patients.This study was 
approved by the Institutional Review Board of the Catholic 
University of Korea School of Medicine (CUMC07U047). 

Statistical analysis 

Linear regression analysis was performed assuming an additive 
model to determine the association of variants with CRP levels. 
Information on CRP levels and covariates were ascertained from 
KARE. CRP concentrations were transformed using the natural 



logarithm function to ensure a normal distribution. Linear 
regression analysis was performed assuming an additive model. 
Among the clinical covariates such as age, sex, body mass index, 
smoking, drinking, high-density lipoprotein (HDL) cholesterol, 
triglycerides, waist circumference, fasting glucose, average pulse, 
systolic blood pressure, diastolic blood pressure, and history of type 
2 diabetes applied in the previous studies of CRP [14], we applied 
age, sex, body mass index, and average pulse for the regression 
analysis which are available in most individuals. We combined the 
stage 1 and 2 results by inverse-variance meta-analysis under the 
assumption of fixed effects. Statistical analyses were performed 
using PLINK [15]. We used Haploview (version 1.4) to create 
Manhattan plots and to calculate LD using a default distance 
option of 500 kb [16]. SNAP software was used to annotate the 
proxy of the top SNP [17]. In SNAP, a regional association plot 
was drawn with the following options: 1000 Genomes Pilot 1 SNP 
data set, CHB andJPT population panel, r 2 threshold of 0.8, and a 
distance limit between the query SNP and the proxy SNP of 
500 kb. The statistical power of the study was evaluated using 
QUANTO version 1.2.4 [18]. In QUANTO, a gene only 
hypothesis was applied with continuous outcome from indepen- 
dent individuals using a desired type I error rate of 0.05 in 2-tailed 
test. The study had 90% power to detect association of a variant 
with a MAF = 0.02 and an effect size of 0. 1 for additive model. 

SNP prioritization 

SNP prioritization was performed via GWASrap (http:// 
jjwanglab.org/gwasrap) [19]. This tool generates a re-prioritized 
genetic variant list by combining the original statistical value and 
variant prioritization score. The 13,345 GWAS SNPs with P< 
0.01 were applied as input values. 

Pathway and network analysis 

We used ICSNPathway software to do pathway analysis and to 
identify candidate SNPs from our GWAS [20]. To get more 
reliable pathway analysis, we also used GSA-SNP software [21]. In 
ICSNPathway software, we chose the following options as follows; 
500 kb up and downstream of the gene as a rule of mapping SNPs 
to the genes; a threshold of P< 1 0 ' 5 for extracting SNPs from 
GWAS; HapMap Han Chinese in Beijing (CHB) data was used for 
the imputation of HapMap population; 200 kb for distance for 
calculating LD. The rest of options were set as default in 
ICSNPathway software. As the database of pathways, we chose, 
Gene Ontology (GO) database. Pathways with false discovery rate 
(FDR) <0.05 and nominal P<0.05 were considered to be 
associated with CRP levels. In GSA-SNP software, based on the 
unimputed SNP P-values from GWAS, we applied GSA-SNP with 
the default parameters: k-th best SNP as 2, SNP-Gene mapping 
with hg 18, padding with ±20,000 bases, and gene count range of 
above 5 and less than 100. For pathway database, we applied the 
same GO database both in ICSNPathway and GSA-SNP. We 
applied Fisher Statistics to combine the nominal P values from 
ICSNPathway and GSA-SNP Software to identify pathways that 
show consistent significance by both methods [22]. To analyze and 
visualize the pathways identified in the GWAS, GeneMANIA 
software was used [23]. In GeneMANIA, we chose the options as 
follows: automatically selected weighting method for network 
weighting; twenty genes for the number of results to be displayed. 
The rest of options were set as default in the software. 
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Results 

Genome-wide associations with CRP levels and their 
replication 

The general characteristics of stage 1 and stage 2 subjects are 
summarized in Table SI in File SI. The stage 1 set consisted of 
7,626 unrelated Korean subjects (3,586 men and 4,040 women) 
and the stage 2 set consisted of 903 Korean individuals (518 male 
and 385 female). The mean ages of the stage 1 and 2 subjects were 
52.5±8.6 and 41.8±8.6 years, respectively. The mean values of 
HDL, triglycerides and fasting glucose levels of phase 1 
participants were 44.9+10.1, 160.2± 102.2, and 85.5±15.8, 
respectively. The GWASs for CRP levels in Korean individuals 
were performed with imputed SNPs using HapMap II data. The 
overall results of the GWAS analyses with the additive model are 
shown as a Manhattan plot (Figure 1) and as a quantile-quantile 
plot (Figure SI in File SI). The genomic control inflation factor 
(Xqq) was 1 .0, indicating no evidence of type 1 error inflation. 

Eighteen SNPs in the CRP locus and one SNP in the HJVF1A 
locus were below the traditional genome-wide significance 
criterion (5xlO -!! ) (Figure 1 and Table S2 in File SI). We applied 
a less stringent criterion (P< 1x10 j to select SNPs for phase 2 
study. Ninety-eight SNPs in eight loci passed the threshold and the 
CRP levels were largely correlated with the genotypes of each SNP 
(Table S2 in File SI). They included 26 SNPs in the CRP locus 
(lq23.2) (top significant SNP: rs7553007, P= 7.34xl0" lfi ), 17 in 
the HNF1A locus (12q24.31) (top significant SNP rsl 169310, 
P=4.95xl0~ 8 ), 40 in the 6q23.2 locus near ARG1 (arginase 1) 
(top significant SNP: rs2608951, P= 1.96xl0~ 7 ), two in the 
SNCAIP locus (5q23.2) (top significant SNP rsl841972, 
P=4.87xl0~ 6 ), one in the EFNA5 (5q21.3) (SNP rsl2517578, 
P=6.98xl0" 6 ), five in the TNFRSF1 IB (8q24.12) (top significant 
SNP rs2062375, P= 5.22 x 10" 6 ), five in the ARHGAP12 



(10pll.22) (top significant SNP rs796126, P= 1.76 x 10" 6 ), and 
one in the TNFSF11 locus (13ql4.11) (top significant SNP 
rsl 7596685, P=6.69xl0" 6 ). 

To validate the association of the 98 SNPs identified by GWAS, 
we examined the available 92 SNPs in an independent stage 2 set 
of 903 Korean individuals (significance criteria of P<0.05 for stage 
2). Among them, SNPs in the CRP, HNF1A, and ARG1 loci were 
found to be consistently significant (Table S3 in File SI). However, 
none of the SNPs in the chromosome 5, 8, 10 and 13 loci were 
significant in the stage 2 set. In a subsequent meta-analysis of the 
stage 1 and stage 2 results, all three loci were more strongly 
associated with CRP levels than in stage 1 and reached the 
traditional genome- wide significance criterion (5xl0~ ): 19 SNPs 
in the CRP locus (lq23.2) (most significant SNP: rs7553007, 
P meta = 1.72x10" l % 17 SNPs in the HMF1A locus (12q24.31) 
(most significant SNP: rs2393791, P meta = 2.90x 10" 10 ), and 21 
SNPs in the ARG1 locus (6q23.2) (most significant SNP: 
rs9375813, P metil = 2.85 x 10~ 8 ) (Table 1). Details are available in 
Table S3 in File SI. The results from stage 1, stage 2, and meta- 
analysis together indicate that the three loci (CRP in lq23.2, 
HNF1A in 12q24.31 and ARG1 in 6q23.2) were consistently 
significant (Table S2 and S3 in File SI). 

Among these three significant loci, the CRP and HJVF1A loci are 
known to be associated with CRP levels [4,24]; however, the ARG1 
locus has not been reported as being associated with CRP levels. 
In the ARG1 locus, 22 neighboring SNPs of the rs9375813 cluster 
in about a 100 kb upstream region of the ARG1 gene were all in 
strong LD with one another (Figure 2). The P-values and the LD 
values of the neighboring SNPs are summarized in Table S4 in 
File SI. Similarly, neighboring SNPs of rs7553007 in the CRP 
locus and those of rs2393791 in the HNF1A locus were well 
clustered with strong LD. The regional associations and LD plots 
of these loci are shown in Figure S2 in File SI. 




17 19 21 

Chromosome 

Figure 1. Manhattan plot showing GWAS results for serum CRP levels in 7,626 Korean subjects. The blue horizontal line (P- 10 8 ) 
denotes the general threshold for genome-wide significance. The red horizontal line (P<10~ 5 ) denotes the threshold for selecting loci for stage 2 
test. The arrow heads indicate three significant loci that passed the threshold. 
doi:1 0.1 371 /journal.pone.0095866.g001 
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Replication of previously identified SNPs in GWAS 

We observed whether the significant loci previously reported as 
relevant to CRP levels in European and Japanese populations, 
including CRP, HJVF1A, IL6R, GCKR, IL6, and APOE-CI-CU 
cluster [4,6-7] were replicated in our study. Six of the nine loci 
were found to be replicated in our study (Table 2). Details of all the 
SNPs in the nine loci are available in Table S5 in File SI. 
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SNP prioritization 

We performed SNP prioritization analysis to identify SNPs with 
mediocre p-values, but with potential for high impact using 
GWASrap tool [19]. Most top rank SNPs still remained significant 
after SNP prioritization (Table S6 in File SI). However, for three 
SNPs, rs2608912, rs2608976, rs2608921, their ranking of signif- 
icance levels became highly elevated after SNP prioritization from 
87th, 88th and 81th to 16th, 17th and 19th, respectively. 
Interestingly, the three SNPs are located about 1 7 — 25 kb 
upstream of ARG1 gene and in perfect LD among themselves, 
and also in moderate LD with rs9375813 (D' =0.671) (Figure S3 
in File SI). 
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Associations of the CRP-associated SNPs with 
cardiovascular phenotypes 

We conducted logistic regression analysis to evaluate the 
associations between CRP-related SNPs with disease phenotypes 
such as CHD (n = 65), myocardial infarction (MI; n = 55), and 
hypertension (n= 1,115). The SNP in^G7 (rs9375813) showed a 
marginal association with hypertension (P= 0.0440). The other 
two SNPs in CRP and HNF1A did not show any significant 
associations with any of the traits (Table S7 in File SI). 
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Pathway analysis 

In addition to GWAS, we performed a pathway analysis to 
identify more variants and pathways that may influence CRP 
levels. To minimize the potential bias of any single algorithm, we 
chose two pathway analysis algorithms to ensure the validity of the 
identified pathways. We used ICSNPathway software to identify 
candidate SNPs and mechanisms that contribute to CRP level and 
to generate pathway hypotheses. In addition, we reconfirmed the 
pathways using GSA-SNP software. 

We used unimputed Phase 1 GWAS P values for pathway 
analysis and identified four candidate SNPs in six pathways 
(nominal P<0.001 and FDR<0.001, Table 3 and Table 4) using 
ICSNPathway: rsl205 in CRP, rs2464196 and rs2464195 in 
HNFIA, and rs 1635498 in EXOl. Among these, SNPs in the CRP 
and HNFIA loci were also identified through GWAS, while a SNP 
in the EXOl gene was exclusively identified through pathway 
analysis. Among the four variants, three were non-synonymous 
variants and one was in the regulatory region (Table 3). Although 
these probes were not present in the Affymetrix SNP 5.0 arrays, all 
were in strong LD with the SNPs represented in the genotyped 
data (r 2 ranged from 0.92 to 1.0). 

The six pathways identified by ICSNPathway provided six 
hypothetical biological mechanisms, including the adaptive 
immune response, leukocyte mediated immunity, photoreceptor 
outer segment, cell-surface binding, cholesterol binding, and 
bacterial cell surface binding (Table 4). In addition to the top- 
ranked four candidate variants in the three genes, other genes such 
as TLR4, C9, CD36, ABCG1, and ABCA1, which are known to be 
related to inflammatory processes, are also involved in these 
pathways. Detailed information about each pathway is available in 
Tables S8 — S13 in File SI. Of the six pathways identified by 
ICSNPathway, four overlapped with the pathways defined by 
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Figure 2. Regional plot of the SNPs in the ARG1 locus (up) and the LD relationship among these SNPs (down). Data are shown for the 
ARG1 locus around rs9375813. Diamond-shaped dots represent -log q0 (P-values) of SNPs, and green diamond in the LD plot indicates the most 
significant SNP. The strength of LD relationship (r 2 ) between the most strongly associated SNP and the other SNPs is presented with red color 
intensities based on JPT+CHB HapMap data. The light blue curve shows recombination rates drawn based on JPT+CHB HapMap data. Green bars 
represent the coding genes in this region. 
doi:1 0.1 371 /journal.pone.0095866.g002 



GSA-SNP, suggesting the reliability of our pathway analysis 
(Table 4). Details of the GSA-SNP analysis are available in Table 
S14 in File SI. When we applied Fisher's method to combine the 
nominal P values of ICSNPathway and GSA-SNP, all four 
pathways showed consistent significance (P<0.001) (Table S15 in 
File SI). 

Network analysis of identified pathways 

To analyze and visualize the pathways identified in GWAS, 
GeneMANIA network analysis was performed. Several new genes 
and gene networks were discovered through the analysis of each 
pathway. Details of each pathway are available in Figures S4— S7 
in File SI. 

Discussion 

We applied the combined methods of GWAS and pathway 
analysis to unravel the genetic polymorphisms associated with 
CRP levels in 8,529 Korean individuals. Although GWAS has 
become the standard approach for the investigation of associations 
between common variants and susceptibility to complex diseases 
[25], a certain amount of biologically meaningful markers and 
genes can be missed because of the stringent statistical threshold 



applied to minimize false-positive findings [9]. Pathway analysis 
can complement the GWAS approach in estimating genetic 
susceptibility to complex diseases like cardiovascular disease and 
type-2 diabetes through evaluating the cumulative effects of 
functionally related genes [26]. By combining GWAS and 
pathway analysis, we identified both well-known and novel genetic 
variants associated with CRP levels. 

Through independent two-stage GWAS and meta-analysis, 
three loci (CRP in lq23.2, HJVF1A in 12q24.31, and ARG1 in 
6q23.2) were found to be consistendy significant and satisfied the 
traditional genome-wide significance criterion (5xl0 -8 ). Other 
than these three loci, we also identified variants in the EFNA5, 
TNFRSF1 IB, and C12orf43 loci. Although none were significant in 
stage 2 testing nor reached the traditional level of significance in 
the meta analysis, they are known to be related to the development 
of CHD [27]. 

The variants that showed the strongest associations were located 
in and around the CRP locus. This is consistent with previous 
GWASs that indicated a strong association of variants in the CRP 
and HNF1A loci with CRP levels in people of European, Asian, 
and African American ancestries [4,6,28]. The second most 
significant variant in our study was in the FLNF1A locus. Recently, 
Kong et al. reported the association of a HMF1A polymorphism 
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(rs2393791) with CRP levels and other phenotypes such as 
arthritis, tuberculosis, and y-GTP in Korean individuals [29]. 
HNF1A binds to the CRP promoter and is involved in the 
regulation of CRP [5]. 

The most notable finding in this study was a significant 
association of rs9375813 near the ARG1 gene with CRP levels in 
the Korean population. This newly identified variant in the 6q23.2 
chromosomal region is located approximately 1 00 kb upstream of 
ARG1 and 150 kb downstream of AKAP7. The LD block, where 
rs9375813 is located, extends into ARG1 but not into AKAP7. In 
addition, three SNPs located about 1 7 — 25 kb upstream ofARGl 
gene were also found to be significantly associated with the CRP 
level in SNP prioritization analysis and in perfect LD among 
themselves, and also in moderate LD with rs9375813. All these 
data suggest that ARG1 is related to the CRP level. Arginase is one 
of the enzymes of the urea cycle in the liver and is critically 
involved in various aspects of inflammation [30]. Although an 
association between ARG1 polymorphisms and the level of CRP 
has not been reported, associations of ARG1 polymorphism with 
CVD and asthma have been reported [31,32]. It is well-known 
that arginase counteracts nitric oxide (NO) synthase and interferes 
with beneficial NO-mediated effects, including vasodilation, 
decreased vascular smooth muscle cell proliferation, decreased 
interaction between white blood cells and the vascular endothe- 
lium, and decreased platelet aggregation [33]. Regarding the 
relationship between arginase and CRP levels, Bekpinar et al. 
reported that the level of arginase was inversely correlated with 
that of hsCRP [34]. Moreover, ARG1 mRNA levels are reported 
to be positively associated with the up-regulation of soluble 
intercellular adhesion molecule- 1 , which is a circulating biomarker 
for endothelial dysfunction [35]. Combining the results from 
previous reports along with our data presented here, we 
hypothesize that ARG1 polymorphisms or pathways might play a 
role in CRP level variation and cardiovascular traits. 

In this study, ARG1 polymorphisms, including the A allele of 
rs9375813, were associated with lower CRP levels. To explore the 
possibility of whether these ARG1 polymorphisms may be 
associated with a lower risk of CVD, we evaluated the association 
of ARG1 SNPs with a history of MI, CHD, and hypertension in the 
discovery subjects. In our logistic regression analysis of the top 
significant SNPs of CRP, HNF1A, and ARG1 loci with cardiovas- 
cular phenotypes, the rs9375813 in ARG1 showed a marginally 
significant association with hypertension (P<0.044), however, the 
other SNPs did not show any significant associations with any of 
the traits. This result is in agreement with a report by Elliott et al. 
that found no association of variants in the CRP locus and CVD in 
a Mendelian randomization study of more than 28,000 cases and 
100,000 controls [24]. However, we cannot exclude the possibility 
that an effect may not be detected due to limitations in CRP 
measurement itself, including the cross-sectional nature of the 
measurement and the limited information available on confound- 
ing variables such as medication history or the presence of active 
inflammation at the time of blood sampling. Also, taking the 
relatively low prevalence of the other phenotypes in our study 
samples (55 MI and 65 CHD out of 7,626 subjects) into 
consideration, further analysis with more cases may help to 
evlauate the association more conclusively between the CRP level- 
associated SNPs and cardiovascular diseases. In spite of the 
limitations described above, it's worth noting that our study 
population is largely disease free population which has already 
been used in large-scale GWASs of similar traits [36,10]. 
Moreover, we have removed the individuals with diabetes mellitus 
who may have increased level of inflammation [37,38] from our 
study. The rs9375813 MAF varies widely between ethnicities: 0.09 
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in Europeans (HapMap CEPH), 0.20 in Africans (HapMap YRI), 
and 0.15 in Asians (HapMap HCB and JPT). Considering that 
most of the large-scale GWASs for CRP levels have been 
performed in Europeans [4,5], the relatively lower MAF in 
European people might be one of the reasons why this locus has 
not been identified in earlier studies. Asians and Africans seem to 
have relatively higher MAFs than Europeans, but the LD 
structures are very different between them (Figure S3 in File SI). 
At the present time, it remains unclear whether the association 
between rs9375813 and CRP levels is Asian-specific; further 
studies in diverse ethnic groups will be required to clarify this issue. 

In this study, six of the nine significant loci reported in previous 
GWASs on CRP levels in diverse ethnic groups were replicated in 
our Korean population study (Table 2). This result suggests that 
these SNPs may be universally linked to CRP levels in human 
beings. Among the replicated SNP loci, those in the IL6R and IL6 
genes showed the same directional effect with ours [4,6] while 
those in GCKR and HMF1A presented the opposite directional 
effect to ours [4]. Interestingly, the directional effect of rs7553007 
in CRP differed between studies: A study with Hispanic American 
individuals showed similar direction results as presented here [7], 
but other studies with Europeans and West Africans showed the 
opposite direction [4,28]. In addition, rs 107 782 13 in the ASCL1 
gene, which was identified in American women, was not found in 
a Japanese population (P= 0.54) nor in this study (P= 0.06). 

To identify more reliable pathways and minimize false positive 
findings, we used two different software packages, ICSNPathway 
and GSA-SNP. In both softwares, using a full list of GWAS SNP P- 
values is desirable. ICSNPathway selects the best -log P value, 
while in GSA-SNP, user has the option of selecting the best or 
second best SNP within a gene boundary to be assign to the gene. 
ICSNPathway compares the distribution of the member gene 
scores of a gene set to all the genes using Kolmogor-Smirnov like 
running-sum statistics. Variation of the number of member genes 
among gene sets is taken care of by multiplying it to factor ml/m2, 
where ml is the proportion of significant genes defined as genes 
mapped with at least one of the top 5% most significant SNPs of 
all SNPs in GWAS for pathways and m2 is the proportion of 
significant genes for all the genes in the GWAS [20] . On the other 
hand, in GSA-SNP, the scores of its member genes are averaged 
for each gene set and significance is estimated using Z-statistics of 
these scores [22]. FDR is computed for multiple testing 
corrections. 

Four out of six pathways from ICSNPathway analysis 
overlapped with pathways identified by GSA-SNP at the FDR 
cut off of <0.001 (Table 4), indicating reliability of the identified 
pathways. When we applied Fisher Statistics to combine the 
nominal P values of ICSNPathway and GSA-SNP, all four 
pathways showed consistent significance (P<0.001). However, a 
number of the pathways were identified only by GSA-SNP which 
suggests that there is a possibility of false positives in the pathway 
analysis results and cross validation will help to rule out them. The 
significantly enriched genes in these pathways, such as CRP, 
HMF1A, PCSK6, CD36, and ABCA1, have a link either to 
inflammation or CVD. The top significant SNP in this study, 
CRP, has also been reported as a top significant gene associated 
with CRP levels in almost all GWASs reported so far [4,7]. In 
addition to CRP, CD36 is one of the key genes enriched in the cell 
surface binding pathway, and has been reported to be associated 
with inflammation-mediated diseases such as atherosclerosis [39]. 
A number of studies have suggested a plausible mechanism that 
may link genes enriched in the cholesterol binding pathway to 
inflammation or to the etiology of atherosclerosis [40] . Indeed, the 
key genes enriched in the cholesterol binding pathway have links 



to cholesterol efflux (ABCG1), inflammation, and atherosclerosis 
(ABCB1 and APOA2) [41]. These results strongly suggest that the 
genes enriched in the cholesterol binding and cell surface binding 
pathways are involved in the regulation of inflammation, which is 
linked to CRP levels and may be involved in atherosclerosis 
pathogenesis. Although the term photoreceptor outer segment has 
no biological relevance to CRP or inflammation, the genes 
enriched in this pathway have biological relevance to CRP 
(HNF1A) [5], high triglycerides {PCDH15) [42] and metabolic 
syndrome (GNAT3) [43]. Likewise, the genes enriched in bacterial 
cell surface binding pathway have relevance to the CRP level 
{CRP), atherosclerosis [CD36] and inflammation (PCSKG) [44-46]. 
Two pathways identified by ICSNPathway, the adaptive immune 
response and leukocyte mediated immunity pathways, did not 
overlap with pathways identified by GSA-SNP. Although not 
overlapped in both software packages, genes enriched in the two 
pathways have a putative connection with inflammation and 
CVDs [47,48]. In addition, some pathways identified only by 
GSA-SNP have potential implications in the inflammation and 
CVD pathways. GSA-SNP identified 116 pathways at the FDR 
cutoff of <0.001 in this study (Table S14 in File SI), with the top- 
ranked pathway identified as low-density lipoprotein particle 
binding. The top enriched genes in this pathway were CRP, 
CDH13, STAB2, THBS1, and SORL1. CDH13 variants have been 
reported to be associated with hypertension [49] . It is worth noting 
that arginase II was enriched in cellular response to the interferon 
gamma pathway in GSA-SNP analysis (Table S12 in File SI). 
Although arginase I and arginase II are localized differently, both 
isoforms catalyze the hydrolysis of L-arginine to L-ornithine and 
urea [50] . This result is additional supporting evidence that ARG1 
polymorphism or pathways related with ARG1 might play a role in 
CRP level variation and cardiovascular traits. 

To analyze and visualize the pathways identified in the GWAS, 
GeneMANIA network analysis was performed. This analysis can 
help to find new genes that have phenotypic relations with query 
gene, which will be useful for selecting the candidates for further 
functional study. We also examined whether the enriched genes in 
the network analysis were reported as being associated with 
immune response or CVD. In the photoreceptor outer segment 
pathway, some of the newly identified genes such as RAP1A and 
CACNA1C have a connection with inflammation [51,52]. Inter- 
estingly, CACNA1C has reported to be enriched in pathway 
analysis of inflammatory conditions such as Crohn's disease 
[51,52]. Among the newly identified genes in that pathway, 
PDE6H and MYL6B had higher weight in the network (Figure S4 
in File SI). Some of the newly identified genes in the cholesterol 
binding pathway, SHH, APOF and APOC1, were also associated 
with cholesterol transport or CVD [53,54]. Among them, DHH, 
SHH and PTCH2 had higher weight in the network (Figure S6 in 
File SI). The key members of the cell surface binding and bacterial 
cell surface binding networks such as TLR6, GDI 4, CD244, CD58, 
SCARB2, and SCARB1 have interconnected roles in inflammatory 
responses and CVD [55,56]. Among them, SCARB1 and SCARB2 
had higher weight in the bacterial cell surface binding network 
(Figure S7 in File SI). Especially, SCARB1 plays a vital role in 
reverse cholesterol transport and is also involved in the removal of 
cholesterol [57]. Taken together, through GeneMANIA network 
analysis of the identified pathways, we were able to identify more 
genes related to inflammation or CVD. 

Overall, some CRP-associated polymorphisms enriched in the 
pathways are involved in the inflammation or pathogenesis of 
CVDs. Although we did not explore the biological effects of 
polymorphisms identified through pathway analysis, these genes 
and pathways may help to generate hypotheses for further 
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functional studies investigating the inter-individual differences in 
CRP levels and CVD risk. 

Our study has several limitations. First, the nature of serum 
CRP as an acute reactant itself can be a limitation. The baseline 
CRP level in each individual should be estimated based on 
repetitive measurements, preferably in the absence of acute 
inflammatory conditions, which we could not achieve in this 
study. Second, the sample size of the replication set may not be 
large enough to verify the potential associations with CRP levels. 
To confirm the associations we identified, replication in a large 
meta-analysis of CRP GWAS specifically in Asian population will 
be required. After that, meaningful functional studies on the SNP 
shoud follow. 

In conclusion, by combining GWAS, pathway and gene 
network analysis, we identified novel ARG1 variants and a number 
of interesting candidate genes related to inflammatory processes 
and CVDs such as CRP, HMF1A, PCSK6, CD36, and ABCA1 in the 
Korean population. Our results also strongly corroborate the 
previously reported loci (CRP, HNF1A, IL6) known to be 
associated with CRP levels in diverse ethnic groups. This study 
highlights the effectiveness of combining GWAS and pathway 
analysis in identifying new genetic variants in meaningful 
pathways, which can improve our understanding of the genetic 
mechanisms behind variations in CRP levels. 

Supporting Information 

File SI Figure SI, A Quantile-Quantile plot of .P-values in the 
GWAS for serum CRP levels (Stage 1). The horizontal axis 
indicates the expected -logio (P-values). The vertical axis indicates 
the observed -logio (P-values). The red line represents y = x. 
Figure S2, (A) Stage 1 data showing a regional association (upper 
panel) and linkage disequilibrium (LD; lower panel) plots of the 
CRP locus around rs7553007. Arrow head represents rs7553007. 
(B) Stage 1 data showing a regional association (upper panel) and 
linkage disequilibrium (LD; lower panel) plots of the HNF1A locus 
around rs2393791. Arrow head represents rs2393791. Figure S3, 
Pairwise linkage disequilibrium (LD) between the selected SNPs in 
ARG1 locus around rs9375813. LD plots for Korea population 
were drawn using the genotype data from the present study, 
whereas LD plots for Japanese, Chinese, Europeans and Africans 
were made from genotype data from HapMap Stage 2. Blue ID 
indicates the most significant SNP. Purple ID indicates the SNPs 
whose rank got elevated after re-prioritization. Figure S4, Gene 
network of photoreceptor outer segment pathway by GeneMA- 
NIA analysis. Using the genes identified from pathway analysis, 
GeneMANIA network analysis was performed. Query genes are 
depicted as black nodes and discovered genes are depicted as gray 
nodes. Edges show different interactions among genes; purple 
indicates for co-expression; light-blue indicates for pathway; dark 
yellow indicates for shared protein domains; red indicates for 
physical interactions; dark blue indicates for co-localization; green 
indicates for genetic interactions. Node sizes are determined 



according to their weight in the network. Figure S5, Gene 
network of cell surface binding pathway by GeneMANIA. Figure 
S6, Gene network of cholesterol binding pathway by GeneMA- 
NIA. Figure S7, Gene network of bacterial cell surface binding 
pathway by GeneMANIA. Table SI, Characteristics of the 
subjects in stage 1 and 2 data. The stage 1 consists of 7,626 
unrelated Korean subjects (3,586 men and 4,040 women) and the 
stage 2 consists of 903 Korean individuals (518 male and 385 
female). Table S2, SNP loci associated with serum CRP levels in 
the stage 1 data. Table S3, SNP loci associated with serum CRP 
levels in the stage 2 data and meta-analysis. Table S4, P-values 
and LD values of neighboring SNPs of the rs9375813 in the ARG1 
locus. Table S5, Associations of the previously reported CRP- 
related loci. Based on genotyped and imputed SNP data, we 
observed the associations of previously reported CRP-associated 
loci. Table S6, Re-prioritized genetic variants list after GWAS. 
Table S7, Associations of the top significant SNP polymorphisms 
on cardiovascular disease traits based on the whole KARE samples 
(7,626 samples). Table S8, Genes mapped with variant in 
adaptive immune response. We performed a pathway analysis to 
identify more variants and pathways that may influence CRP 
levels using ICSNPathway software and reconfirmed the identified 
pathways with GSA-SNP software. This table shows the genes 
mapped with variants in adaptive immune response among the ten 
candidate pathways enriched with CRP-associated SNPs in 
ICSNPathay analysis. Table S9, Genes mapped with variant in 
leukocyte mediated immunity. Table S10, Genes mapped with 
variant in photoreceptor outer segment. Table Sll, Genes 
mapped with variant in cell surface binding. Table S12, Genes 
mapped with variant in cholesterol binding. Table S13, Genes 
mapped with variant in bacterial cell surface binding. Table S14, 
Pathway analysis results of GSA-SNP software. Table S15, 
Combined P value estimation of ICSNPathway and GSA-SNP by 
Fisher Statistics. 
(PDF) 
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